WHAT IS CLAIMED IS:

1. A method for improving appearance of captured bilevel image data,
comprising:

receiving a degraded input bilevel image; 

connecting dark pixels in the input image to adjacent dark pixels to form 
connected components comprising a set of dark pixels connected either diagonally or 
orthogonally and surrounded by white pixels; 

performing initial clustering of individual connected components; 

determining a "most likely" cluster representative by use of a probabilistic 
model of the scanner used for scanning; and 

assembling the sets by substituting the "most likely" cluster representative for
each family member of each cluster set to form an output image.
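By way of non-limiting illustration, the connecting step of claim 1 may be sketched as a breadth-first labeling of dark pixels using diagonal and orthogonal adjacency. The function name `connected_components` and the encoding (1 for dark, 0 for white, image given as a list of rows) are assumptions made for this sketch only.

```python
from collections import deque


def connected_components(image):
    """Group dark pixels (value 1) into 8-connected components.

    Returns a list of components in row-major discovery order; each
    component is a sorted list of (row, col) tuples.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Start a new component and flood outward from this pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                # Visit orthogonal and diagonal neighbors.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(sorted(pixels))
    return components


example = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
components = connected_components(example)
```

In this example the two pixels touching only at a corner fall into the same component, because diagonal adjacency counts as a connection.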

2. The method of claim 1, wherein the step of performing initial clustering 
includes pair-wise matching connected components and determining a match if the 
pair are within a certain threshold of matching to form cluster sets, each with one or 
more family members formed of individual connected components; and 

the step of determining a "most likely" cluster representative for each cluster
set uses an optimization procedure in which at least one iteration of pixel flipping is
performed.

3. The method of claim 2, wherein the optimization procedure is a hill-climbing
optimization procedure.
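By way of non-limiting illustration, a hill-climbing procedure based on pixel flipping may be sketched as follows. The function name `hill_climb_flips` is an assumption, and the `score` callable is a hypothetical stand-in for the probability of the cluster set given a candidate representative; the claims do not prescribe this particular loop.

```python
def hill_climb_flips(representative, score):
    """Greedy pixel flipping: keep a flip only if it raises the score.

    `representative` is a list of rows of 0/1 values; `score` maps such
    a grid to a number (higher is better). Sweeps repeat until a full
    pass over the grid yields no improvement.
    """
    rep = [row[:] for row in representative]  # work on a copy
    best = score(rep)
    improved = True
    while improved:
        improved = False
        for r in range(len(rep)):
            for c in range(len(rep[r])):
                rep[r][c] ^= 1          # tentatively flip the pixel
                candidate = score(rep)
                if candidate > best:
                    best = candidate    # keep the flip
                    improved = True
                else:
                    rep[r][c] ^= 1      # undo the flip
    return rep, best


# Toy score for demonstration only: negative Hamming distance to a target.
target = [[1, 0], [0, 1]]


def toy_score(grid):
    return -sum(g != t
                for g_row, t_row in zip(grid, target)
                for g, t in zip(g_row, t_row))


result, final_score = hill_climb_flips([[0, 0], [0, 0]], toy_score)
```

Because a flip is accepted only when the score strictly increases, the loop terminates at a local maximum of the supplied score.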

4. The method of claim 2, wherein an initial representative is determined by
finding a translation $\tau$ of each family member that maximizes the probability that the
family member is a given original image to obtain a higher-resolution histogram.

5. The method of claim 2, wherein an initial representative is determined by
summing scans of each maximized family member to form a double-resolution 
histogram. 

6. The method of claim 2, wherein multiple iterations of pixel flipping are
performed, with a first iteration of pixel flipping being performed with a first
threshold and a subsequent iteration being performed with a more progressive
threshold being applied until a maximized probability is attained.

7. The method of claim 1, further comprising a step of reclustering by 
comparing cluster representatives of clusters. 

8. The method of claim 7, wherein the step of reclustering reclusters the cluster 
sets by comparing cluster representatives of smaller cluster sets with cluster 
representatives of larger cluster sets and merging the smaller cluster set into the larger 
cluster set when a normalized probability exceeds a default threshold. 

9. The method of claim 2, wherein the initial clustering includes:

forming a bounding box around each connected component A and B;

aligning connected components A and B to each other by aligning centers of
their bounding boxes; and

determining a match of the connected components A and B if:

$$|A| - |A \cap \bar{B}| < f(|\partial A|) \quad \text{and} \quad |B| - |B \cap \bar{A}| < f(|\partial B|)$$

where:

|A| denotes the number of black pixels in A;

$A \cap B$ denotes the pixels that are black in both A and B;

$\bar{A}$ denotes a one-pixel dilation of the black pixels in A;

$\partial A$ denotes the boundary of A, that is, the set of black pixels with white
neighbors; and

f(n) equals 0 for n < 3 and 0.025n for n > 7, and interpolates between
these two lines for $3 \le n \le 7$.
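By way of non-limiting illustration, the match test of claim 9 may be sketched on components that are already aligned and represented as sets of (row, column) coordinates. Several details are assumptions made for this sketch and are not fixed by the claim: the interpolation of f between n = 3 and n = 7 is taken to be linear, the one-pixel dilation uses the full 3x3 neighborhood, and "white neighbors" is evaluated over the eight surrounding pixels. All function names are hypothetical.

```python
# Offsets of the eight surrounding pixels (assumed neighborhood).
NEIGHBORS_8 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dy, dx) != (0, 0)]


def f(n):
    """Tolerance: 0 below 3, 0.025*n above 7, linear in between (assumed)."""
    if n < 3:
        return 0.0
    if n > 7:
        return 0.025 * n
    return 0.025 * 7 * (n - 3) / 4  # line from (3, 0) to (7, 0.175)


def dilate(pixels):
    """One-pixel dilation of a set of black pixels."""
    out = set(pixels)
    for y, x in pixels:
        for dy, dx in NEIGHBORS_8:
            out.add((y + dy, x + dx))
    return out


def boundary(pixels):
    """Black pixels that have at least one white neighbor."""
    return {(y, x) for y, x in pixels
            if any((y + dy, x + dx) not in pixels for dy, dx in NEIGHBORS_8)}


def is_match(a, b):
    """Apply both inequalities of the match test to aligned components."""
    a, b = set(a), set(b)
    uncovered_a = len(a) - len(a & dilate(b))  # |A| - |A n dilated B|
    uncovered_b = len(b) - len(b & dilate(a))  # |B| - |B n dilated A|
    return (uncovered_a < f(len(boundary(a)))
            and uncovered_b < f(len(boundary(b))))


square = {(y, x) for y in range(5) for x in range(5)}
shifted_by_one = {(y, x + 1) for y, x in square}
shifted_by_three = {(y, x + 3) for y, x in square}
close_match = is_match(square, shifted_by_one)
far_match = is_match(square, shifted_by_three)
```

The dilation makes the test tolerant of one-pixel misregistration, so the square shifted by one column still matches while the square shifted by three columns does not.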

10. The method of claim 9, further comprising stopping a match if one of
$A \setminus B$ or $B \setminus A$ includes a set of more than three pixels that can be enclosed by a 3x3
box.

11. The method of claim 9, wherein the probability that a pixel in row i and
column j has value $A_{ij}$ (black or white) given B and $\tau$ is determined as:

$$P[A_{ij} \mid B, \tau] = \begin{cases} p(w_{ij}(\tau)) & \text{if } A_{ij} \text{ is black,} \\ 1 - p(w_{ij}(\tau)) & \text{if } A_{ij} \text{ is white,} \end{cases}$$

where $\tau$ represents a translation of a sensor grid used to capture the input
image with respect to a given original image region B;

$w_{ij}(\tau)$ denotes the weight of black in the given original image region B seen
by the sensor grid in row i and column j based on a point spread function of the sensor
grid; and

$p(w_{ij}(\tau))$ denotes a determined probability that the sensor grid's output pixel
would be black.

12. The method of claim 11, wherein individual pixel probabilities within a
connected component A are multiplied to obtain a probability $P[A \mid B, \tau]$ that the
connected component A is a capture of the given original image region B at
translation $\tau$ as:

$$P[A \mid B, \tau] = \prod_{i,j} P[A_{ij} \mid B, \tau]$$
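By way of non-limiting illustration, the product of per-pixel probabilities may be accumulated as a sum of logarithms, which avoids numerical underflow for large components. In this sketch the grid `black_prob` is a hypothetical, precomputed input holding, for each sensor pixel, the probability that the output pixel would be black at a fixed translation; how that grid is derived from the point spread function is not shown. The function name and the clamping constant `eps` are likewise assumptions.

```python
import math


def log_prob_component(observed, black_prob, eps=1e-12):
    """Log of the product of per-pixel probabilities for one component.

    `observed` holds 1 for black and 0 for white; `black_prob` holds the
    modeled probability of black for the same pixel positions.
    """
    total = 0.0
    for obs_row, prob_row in zip(observed, black_prob):
        for a_ij, q in zip(obs_row, prob_row):
            q = min(max(q, eps), 1.0 - eps)  # keep log() finite
            # Black pixels contribute q, white pixels contribute 1 - q.
            total += math.log(q if a_ij == 1 else 1.0 - q)
    return total


observed = [[1, 0]]
black_prob = [[0.9, 0.2]]
log_p = log_prob_component(observed, black_prob)
```

Here the two pixels contribute 0.9 and 0.8 respectively, so exponentiating `log_p` recovers their product.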

13. The method of claim 12, wherein $\tau$ is optimized over all possible
translations as

$$P[A \mid B] = \max_{\tau} P[A \mid B, \tau].$$

14. The method of claim 13, wherein connected component A and given 
original image region B are prealigned by the centroids of their respective bounding 
boxes. 

15. The method of claim 14, wherein optimization of $\tau$ is limited to the nine
shortest vectors in the lattice of the bounding box.
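By way of non-limiting illustration, restricting the search to nine candidate translations may be sketched as follows. For a square lattice, the nine shortest vectors are the zero vector and its eight nearest lattice neighbors; the lattice spacing `step` and the callable `log_prob`, which is assumed to return the log-probability of the component at a given translation, are hypothetical parameters of this sketch.

```python
def best_translation(log_prob, step=1.0):
    """Return the candidate translation with the highest log-probability.

    Candidates are the zero vector and its eight nearest neighbors on a
    square lattice with spacing `step`, given as (dy, dx) tuples.
    """
    candidates = [(dy * step, dx * step)
                  for dy in (-1, 0, 1)
                  for dx in (-1, 0, 1)]
    return max(candidates, key=log_prob)


# Toy objective for demonstration only: peaks at translation (0.5, -0.5).
def toy_log_prob(t):
    return -((t[0] - 0.5) ** 2 + (t[1] + 0.5) ** 2)


best = best_translation(toy_log_prob, step=0.5)
```

With a half-pixel spacing, the toy objective is maximized exactly at one of the nine candidates.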

16. The method of claim 13, wherein the probability of an entire cluster set is 
computed by multiplying the probabilities of each individual family member using 

$$P[C \mid B] = \prod_{A \in C} P[A \mid B]$$

and the initial "most likely" cluster representative is the one that maximizes P[C|B]. 

17. The method of claim 1, further comprising using chain codes to define a 
priori probabilities to find the cluster representative. 

18. The method of claim 17, wherein the a priori probability is computed by 
determining the product of transition probabilities around all connected components 
of the cluster representative to attain a value $B^1$ and the "most likely" representative is
the $B^1$ with a maximum $P[C \mid B^1]$.

19. The method of claim 7, wherein the step of reclustering is performed by 
normalizing probabilities using 

$$N[A_i \mid B_j] = (P[A_i \mid B_j])^{1/p},$$

where p is the number of pixels in the connected component $A_i$ (aligned with the
representative image $B_j$) that are within a sensor disk's radius of a black pixel in either
the connected component $A_i$ or the representative image $B_j$.
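By way of non-limiting illustration, the normalization may be computed from a log-probability, since taking the p-th root of a probability is equivalent to dividing its logarithm by p. The function name and the choice to pass the probability in log form are assumptions made for this sketch.

```python
import math


def normalized_probability(log_p, num_pixels):
    """Return P ** (1 / p), computed as exp(log(P) / p).

    `log_p` is the natural log of the match probability and `num_pixels`
    is the pixel count p used for normalization.
    """
    if num_pixels <= 0:
        raise ValueError("num_pixels must be positive")
    return math.exp(log_p / num_pixels)


# Example: 100 pixels, each contributing a probability of 0.9.
log_p = 100 * math.log(0.9)
n_value = normalized_probability(log_p, 100)
```

The normalized value behaves like a per-pixel geometric mean, which makes it comparable across components of different sizes when tested against a fixed threshold.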

20. The method of claim 19, wherein the threshold is 0.70. 



21. The method of claim 19, wherein the threshold is 0.68 when the smaller 
cluster set is a singleton and the larger cluster set has at least four family members. 

22. The method of claim 7, wherein reclustering stops when the larger cluster
set has fewer than four family members.

23. The method of claim 1, further comprising a step of breaking run-through 
letters by computing a sequence of breakable positions of singleton cluster 
representatives and comparing each breakable position portion with other cluster 
representatives. 

24. The method of claim 23, further comprising a step of merging a breakable 
position portion with a cluster set when the comparison indicates a sufficient match. 

25. The method of claim 1, wherein the step of assembling includes aligning
centers of bounding boxes and testing double-resolution translations to recompute 
alignment and determine the most likely position of the cluster representative. 



